Printing Structured Text without Stylesheets?

Authors

  • Helena Ahonen
  • Barbara Heikkinen
  • Oskari Heinonen
  • Mika Klemettinen
Abstract

As more and more XML documents start to appear, e.g. on the WWW, users face a new problem: unlike HTML tags, XML tags do not convey the semantics of a structure element. This means that if a document does not come with layout specifications, e.g. XSL or CSS, it is not easy to say how the document should be formatted for presentation in print or on screen. In this paper we describe a tool and a process with which a document without any stylesheets or styling information can be automatically transformed for viewing on different target media (paper, WWW browser, WAP phone, etc.). Our approach is based on DTD generalization and element mapping, followed by transformation and styling with XSLT. We give hands-on examples of the automatic transformation pipeline, from the original stylesheetless document to the transformed result document with media-specific styling, and show that the approach works in practice.

Prof. Helena Ahonen-Myka, Ph.D. Mika Klemettinen, Ph.D. Barbara Heikkinen, and M.Sc. Oskari Heinonen have their backgrounds in structured documents and data mining. Ahonen-Myka received her Ph.D. in 1996 with a thesis titled “Generating grammars for structured documents using grammatical inference methods”. Klemettinen obtained his Ph.D. in 1999; his thesis was about knowledge discovery in the telecom area. Heikkinen received her Ph.D. in 2000 with a thesis titled “Generalization of Document Structures and Document Assembly”. Heinonen is finishing his Ph.D. studies. All the authors except Heikkinen currently work at the Department of Computer Science at the University of Helsinki; Heikkinen is with the Nokia Research

Similar resources

CTS: An Interactive Technique for Manipulating Structured Text

This paper describes Complex Textual Strings (CTS), a technique for manipulating structured text strings and an underlying application data structure by creating a two-way mapping between the text and the data structure. An editable specification, called a unification grammar, describes the structure and data mapping characteristics of a particular instance of CTS. We also describe an interpreter ...

Usage of XSL Stylesheets for the Annotation of the Sámi Language Corpora

This paper describes an annotation system for Sámi language corpora, which consists of structured, running texts. The annotation of the texts is fully automatic, starting from the original documents in different formats. The texts are first extracted from the original documents preserving the original structural markup. The markup is enhanced by a document-specific XSLT script which contains do...

Survey of Global Regular Expression Print (GREP) Tools

The UNIX grep utility marked the birth of global regular expression print (GREP) tools. Searching for patterns in text is an important operation in a number of domains, including program comprehension and software maintenance, structured text databases, indexing file systems, and searching natural language texts. Such a wide range of uses inspired the development of variations of the original UN...

Supervised Text Region Identification on Historical Documents

We present multi-column text region identification support for Ocular, the unsupervised historical printed document transcription project of Berg-Kirkpatrick et al. (2013). We use structured prediction with rich features defined on the input document and incorporate a transition model based on prior document layout assumptions. Our model is trained using a structured-SVM objective on a randomly...

An XML model of CSS3 as an XeLaTeX-TeXML-HTML5 stylesheet language

HTML5 [1] and CSS3 [2] are popular languages for Web development. However, HTML with CSS is prone to errors and difficult to port, so we propose an XML version of CSS that can be used as a standard for creating stylesheets and templates across different platforms and pagination systems. XeLaTeX [3] and TeXML [4] are some examples of XML that are close in spirit to TeX that can benefit from such a...

Publication date: 2000